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AN IMPLEMENTATION OF THE QMR METHOD 
BASED ON COUPLED TWO-TERM RECURRENCES* 

ROLAND W. FREUND* AND NOEL M. NACHTIGAL* 

Abstract. Recently, the authors have proposed a new Krylov subspace iteration, the quasi-minimal 
residual algorithm (QMR), for solving non-Hermitian linear systems. In the original implementation 
of the QMR method, the Lanczos process with look-ahead is used to generate basis vectors for the 
underlying Krylov subspaces. In the Lanczos algorithm, these basis vectors are computed by means 
of three-term recurrences. It has been observed that, in finite precision arithmetic, vector iterations 
based on three-term recursions are usually less robust than mathematically equivalent coupled two-term 
vector recurrences. 

This paper presents a look-ahead algorithm that constructs the Lanczos basis vectors by means 
of coupled two-term recursions. Implementation details are given, and the look-ahead strategy is 
described. A new implementation of the QMR method, based on this coupled two-term algorithm, 
is proposed. A simplified version of the QMR algorithm without look-ahead is also presented, and 
the special case of QMR for complex symmetric linear systems is considered. Results of numerical 
experiments comparing the original and the new implementations of the QMR method are reported. 

Key words. Krylov subspace iteration, quasi-minimal residual method, non-Hermitian matrices, 
coupled two-term recurrences, look-ahead techniques, complex symmetric matrices 

AMS(MOS) subject classifications. 65F10, 65N22 


1. In troduction. Recently, we have proposed a new Krylov subspace iteration, 
the quasi-minimal residual algorithm (QMR) [9], for solving general nonsingular non- 
Hermitian systems of linear equations 

(1.1) Ax = b. 

The QMR method is closely related to the classical biconjugate gradient algorithm 
(BCG) due to Lanczos [14]. The BCG method aims at generating approximate so- 
lutions for (1.1) that satisfy a Galerkin condition. Unfortunately, for non-Hermitian 
matrices A , such iterates need not always exist, and this is the source of one of the two 
possible breakdowns — triggered by division by 0 — that can occur during each iteration 
step of BCG. The second breakdown is equivalent to the possible breakdown — also trig- 
gered by division by 0 — of the nonsymmetric Lanczos process [13]. In finite precision 
arithmetic, it is unli kely that one encounters exact breakdowns in the BCG algorithm. 
However, near- breakdowns can occur, which can cause a build-up of round-off in suc- 
cessive iterations. Another problem with BCG is the lack of a residual minimization 

* This work was supported by Cooperative Agreement NCC 2-387 between the National Aeronautics 
and Space Administration (NASA) and the Universities Space Research Association (USRA). 

t AT&T Bell Laboratories, Room 2C-420, 600 Mountain Avenue, P.O. Box 636, Murray Hill, New 
Jersey 07974-0636. This research was performed while this author was in residence at the Research 
Institute for Advanced Computer Science (RIACS), NASA Ames Research Center, Moffett Field, 
California 94035. 

t RJACS, Mail Stop T041-5, NASA Ames Research Center, Moffett Field, California 94035. 
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property for its iterates, which leads to a typically erratic convergence behavior, with 
wild oscillations in the residual norm. 

The QMR method offers remedies for these problems. It generates iterates that are 
defined by a quasi-minimization of the residual norm, rather than a Galerkin condition. 
This eliminates the oscillations and leads to a smooth and nearly monotone convergence 
behavior. In contrast to BCG, a QMR iterate always exists at each iteration step, and 
this excludes breakdowns caused by non-existent iterates. Moreover, possible break- 
downs in the underlying Lanczos process are prevented by using look-ahead techniques. 
Therefore, except for the rare event of an incurable breakdown, breakdowns cannot 
occur in the QMR method. 

In the original QMR algorithm [9], an implementation of the Lanczos method with 
look-ahead is used to generate basis vectors for the underlying Krylov subspaces. In the 
Lanczos process, these basis vectors are generated by means of three- term recurrences. 
It has been observed that, in finite precision arithmetic, vector iterations based on 
three-term recursions are usually less robust than mathematically equivalent coupled 
two-term vector recurrences. In this paper, we present a general look-ahead algorithm 
based on coupled two-term recursions for constructing basis vectors of Krylov subspaces. 
Based on this algorithm we then propose a new implementation of the QMR method. 

The remainder of the paper is organized as follows. In § 2, we briefly review the 
Lanczos process and the original QMR algorithm. In §3, we present a sketch of the 
proposed look-ahead procedure for constructing Lanczos vectors by means of coupled 
two-term recurrences. In §4, we discuss the look-ahead strategy for this algorithm, 
and in §5, we give implementation details. Next we combine the coupled two-term 
procedure with the QMR approach. In § 6, we outline the resulting implementation for 
the general case of QMR with look-ahead. In § 7, we present a simplified version of 
the QMR algorithm without look-ahead. In § 8, we consider a variant of QMR for the 
special case of complex symmetric linear systems. In § 9, we report results of numerical 
experiments comparing the original and the new implementations of the QMR method. 
Finally, in § 10, we make some concluding remarks. 

Throughout the paper, all vectors and matrices are allowed to have real or complex 
entries. As usual, M T = and M H = denote the transpose and the conjugate 

transpose, respectively, of the matrix M = [m jfc ] . We use a (M) and cr mi „{M) for the 

largest and smallest singular value of M, respectively. The vector norm ||x|| := \Jx H x 
is always the Euclidean norm and ||M|| := is the corresponding matrix norm. 

We denote by 

K : = {?(*) = 7o + 7i* + ' • * + 7„A n | 7o> 7 i, • * • , 7 n ^ C} 

the set of all complex polynomials of degree at most n. The nth Krylov subspace of 
C" generated by c 6 C N and the N x N matrix B is defined by 

K n (c, B) := span{c, Be, • • • , £? n-1 c}, 

and we will make use of the faict that 


K n (c,B) = MB)c 


<P € i)- 
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Furthermore, it is always assumed that A is an N x N matrix, singular or nonsingular. 

Finally, we remark that, for complex matrices, there are two equivalent formulations 
of the Lanczos process, using either A T or A H . In this paper, we have chosen the 
formulation with A T , for two reasons. First, it avoids complex conjugation of the scalars 
in some of the recurrence relations, and second, the recursions reduce immediately for 
the special case of complex symmetric matrices. 

2. The QMR algorithm. In this section, we briefly describe the QMR method 
and its original implementation [9]. We remark that, in [9], QMR was proposed for 
nonsingular linear systems (1.1). Freund and Hochbruck [8] showed that the QMR 
method can also be applied to singular square systems, and that it always generates well- 
defined iterates. However, as discussed in [8], these iterates converge to a meaningful 
solution of (1.1) only for consistent systems with coefficient matrices of index 1. An 
important special case for which these conditions are satisfied is consistent singular 
systems with a one-dimensional null space, i.e., 

(2.1) 6 6 {Ax x G C"} and dim{x € | Ax = 0} = 1. 

In this paper, we always consider the QMR method for the general case of N x N linear 
systems, with singular or nonsingular coefficient matrices A. 

2.1. Krylov subspace methods. Let x 0 € be an arbitrary initial guess for 
the linear system (1.1), and denote by r 0 :=b — Ax 0 the corresponding residual vector. 
An iterative scheme for solving ( 1 . 1 ) is called a Krylov subspace method if, for any choice 
of x 0 , it produces approximate solutions of the form 

(2.2) x n 6 x 0 + K n (r 0 ,A), n = l,2,---. 

Clearly, the design of a Krylov subspace algorithm consists of two main parts: the 
construction of suitable basis vectors for the Krylov subspares K n (r 0 , A) in (2.2), and 
the choice of the actual iterates x n . The QMR method is an example of a Krylov 
subspace iteration, where the basis vectors are generated by means of the nonsymmetric 
Lanczos process, and the iterates are characterized by a quasi-minimal residual property. 
Next, we describe these two main ingredients of QMR. 

2.2. The Lanczos process. The Lanczos method is started with two vectors, 

(2.3) v 1 = rjp lt where p x = ||r 0 ||, 

and an arbitrary second starting vector 

(2.4) w x € C N with ||to x || = 1. 

It then produces two sequences of vectors 

(2.5) {vy}i=i “d 
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such that, for n = 1, 2, • • • , 

span{i; 1 ,u 2 , • ■ • ,u n } = K n {v lt A ), 
spanju;^ w 2 , • • • , u; n } = K n (w l , A 7 ), 

and the two sets are biorthogonal or block biorthogonal, i.e., 

(2.7) fV?V„ = D„ 

where D n is a diagonal or block diagonal matrix. Here and in the sequel, we denote by 

(2.8) K : =K v 2 ••• v n] and W n := [u7 x w 2 u> n ] 

the matrices containing the Lanczos vectors {Uj}j = i and as columns. We re- 

mark that the conditions (2.6)-(2.7) determine the vectors (2.5) only up to scaling. 
Throughout this paper, we always scale the Lanczos vectors to have unit length: 

(2.9) IKII = IWI = 1» n = l,2,---. 

The crucial point of the Lanczos process is that vectors satisfying (2.6)-(2.7) can be 
constructed by means of short vector recursions. In the classical Lanczos algorithm [13], 
the vectors are generated using simple three-term recurrences: 


(2 10 ) 5n+1 = AUn ” Vnfin ~ V "~ ll/n ’ 

Pn+1 = ll^n+lll> u n+l = ^n+l/^n+H 

. 2 n , ™n+l = ATw n ~ W nPn ~ W n-l i V nPjtn), 

L+l = ll^n+llh ™n+l = «>n+l/£»+H 

(2.12) where n n = w*AvJwZv n , v n = t n wlvjwl_ x v n _ v 
In this case, the matrix D n in (2.7) is diagonal and nonsingular: 

(2.13) D n = diag(6 lt 6j, • • • , 6 n ), where Sj := wjv, ^ 0. 


Furthermore, we note that, using the notation introduced in (2.8), the recurrence re- 
lations (2.10)— (2.11) for the first n + 1 Lanczos vectors and }y=i can be 

written compactly in matrix form: 


(2.14) AV n = V n+1 H n , 

(2.15) a t w„ = 

Here, H n is an (n + 1 ) x n tridiagonal matrix, and 

, , fl if j = 1, 

(2.16) r n :=diag( 7l , 72 ,---,7n), where ^ if j > 1 , 
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is a diagonal scaling matrix with positive diagonal entries. Finally, for later use, we 
note that all subdiagonal elements of H n are nonzero, and therefore 

(2.17) rank// n = n. 

Unfortunately, in the classical Lanczos algorithm, breakdowns cannot be excluded. 
Indeed, by (2.12), division by 0 will occur during the construction of v n+1 and u; n+1 if 
wlv n = 0, but w n / 0 and v n ^ 0. Parlett, Taylor, and Liu [15] were the first to devise 
a practical modification of the Lanczos procedure that uses look-ahead to skip over 
possible exact breakdowns (w^v n = 0) or near-breakdowns (w^v n is nonzero, but small 
in some sense). The QMR algorithm is based on a different implementation of the look- 
ahead Lanczos method, recently developed by Freund, Gutknecht, and Nachtigal [7]. 
Next, we briefly sketch this look-ahead Lanczos procedure. 

As in the classical Lanczos algorithm, two sequences of Lanczos vectors {u_,}" =1 
and {u>j}” =1 are generated, starting with (2.3) and (2.4). Again, we will use the matrix 
notation V n and VT n defined in (2.8). As before, these vectors satisfy (2.6)— (2.7), but 
now D n is, in general, only a block diagonal matrix, with l := l(n) square blocks of 
dimension hj, j = 1, 2, • • • , l, on the diagonal. More precisely, we have 

(2.18) D n = diag(D (1) , L> (2) , ■ • • , 1 9 (,) ), where := (W (,) ) 2 V (j) . 

Here, the matrices V {j) and W ij \ j = 1, 2, •••,/, are defined by partitioning V n and W n 
into blocks, according to the look-ahead steps taken: 

(2.19) V n = [VW y(2) ... y(0] and W n = [W™ W™ W^\. 

We remark that the matrices V^fi and are of size N x hj, and they contain as 
their columns the Lanczos vectors constructed in the j'th look-ahead step. The integer 
h : is called the length of the jth look-ahead step, and / in (2.18)-(2.19) is the number 
of look-ahead steps that have been performed during the first n steps of the Lanczos 
process. For later use, we introduce some further notation. For j = 1, 2, • • ■ , we denote 
by Uj the index of the first vectors of the blocks and in (2.19); hence, we have 

(2.20) V (i) = {v nj v nj+1 ••• ] and W {i) = [w nj u> nj+1 ••• ]. 

Note that the indices n- satisfy 

(2.21) 1 =: < n 2 < • • • < n, < n < n J+1 . 

The vectors v n and w n are called regular, while the remaining vectors in the blocks 
I/b) an d Wifi are called inner. We remark that, in view of (2.7) and (2.18), the regular 
vectors are biorthogonal to all previous Lanczos vectors, i.e., 

(2.22) w I v n, = w I, v i = 0 for ’ » n j ~ 

while the inner vectors in the j'th blocks V (j) and W u) are biorthogonal to all Lanczos 
vectors from the previous blocks, but not necessarily to the Lanczos vectors in the jth 
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blocks. Finally, we note that, in (2.18), the blocks D^\ j = 1,2, •••,/ — 1, are all 
nonsingular, while the last block is nonsingular if n J+1 = n -f 1 , i.e., if v n+1 and 
u > n+1 are constructed as regular vectors. 

In the look-ahead algorithm, the Lanczos vectors (2.5) are again generated us- 
ing only short vector recurrences, which now involve vectors from the last two blocks 
y(0 5 y('-i) anc l iyW, jy( /_1 ), instead of just v n , v n _ x and w nt u> n _ x , as in the classical 
algorithm. For example, u n+x is computed by means of the relations 

0 n+1 = Av„ - V<V - VV-%„ 

Pn+1 ~ ll^n+llli U n+1 = ^n+l/Pn+l > 

where fi n € C h> and v n € C h ‘~ 1 are suitable coefficient vectors. The second Lanczos 
vector tu n+x is obtained similarly. As before, the recurrence relations for the Lanczos 
vectors can be summarized in the form (2.14)-(2.15). Here, H n is now a block tridiag- 
onal matrix with l square blocks on the diagonal, where the jth block has dimension 
hj x hj, j = 1,2, ••• , In addition, H n is also an upper Hessenberg matrix. Further- 
more, (2.16) and (2.17) remain valid, and p : and £, in (2.16) are scaling factors used to 
ensure the normalization (2.9) of the Lanczos vectors, see (2.23). 

We would like to stress that the look-ahead strategy (see [7] and also § 4 below) is 
such that the algorithm performs mostly standard Lanczos steps, i.e., look-ahead steps 
of length hj = 1 with blocks and that consist of only single Lanczos vectors. 
True look-ahead steps, i.e., steps of size hj > 1 , are only used to avoid exact and near- 
breakdowns. Typically, except for contrived examples, only few true look-ahead steps 
occur, and their size is usually small, mostly hj = 2. We note that, if only steps of 
length hj = 1 are performed, then the look-ahead Lanczos algorithm reduces to the 
classical algorithm. Finally, we remark that so-called incurable breakdowns [18, 12] can 
occur in the Lanczos process. Such breakdowns cannot be remedied by look-ahead, and 
indeed, in such a case, the look-ahead Lanczos algorithm would build a block of size 
Aj = oo. Fortunately, incurable breakdowns are extremely rare, and they do not present 
a problem in practice. 

The look-ahead Lanczos algorithm is intimately connected with formally orthogonal 
polynomials, see, e.g., [ 12 , 7]. In particular, we will use the fact that each pair of Lanczos 
vectors Vj and Wj can be expressed in the form 

(2.24) Vj = <t>j_ x {A)v x and Wj = 7 i ^_ 1 (A T )tu 1 , 

where <f>j_ x € 'Pj-i is of exact degree j — l and 7 ■■ > 0 is defined in (2.16). 

For further details and properties of the look-ahead Lanczos algorithm, we refer the 
reader to [7]. 

2 . 3 . The quasi-minimal residual property. In the QMR method, the vectors 
{uj}"_ x generated by the look-ahead Lanczos algorithm are used as a basis for the 
Krylov subspace K n (r 0 , A) in (2.2). The nth QMR iterate x n is then defined by 

= *0 + K*«» 


(2.23) 


(2.25) 
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where z n € C n is the unique solution of the least squares problem 

(2.26) ll/n +1 - Vn+iHnZj = min ||/ n+1 - n n+1 H n z\\. 

z t v 

Here 

(2.27) /n+i : = “ 1 P 1 * 1 1 0 ••• 0] T € R n+1 , 

with p x given in (2.3), and 

(2.28) fl n+1 : = dia g( w i> u, 2»-‘*> u, n+i)> «; > 0, j = 1,2, •■•,» + 1, 

is an arbitrary diagonal weighting matrix. Note that, in view of (2.17) and (2.28), the 
(n + 1) x n matrix ft n+1 H n has full rank n. This guarantees that there always exists 
a unique solution of (2.26). Furthermore, we remark that the standard choice for the 
weights in (2.28) is 

(2.29) Uj = 1 for all j. 

However, there are instances (see [10]) where the use of different weights is crucial, and 
therefore, we formulate the QMR method in the general setting (2.28). 

From (2.25), (2.14), (2.27), and (2.3), it follows that the residual vector r n := b-Ax n 
corresponding to x n satisfies 

(2.30) r n = V w+1 fi n Jl (/n+l — ^n+l^n Z ») • 

Hence, in view of (2.26), the nth QMR iterate x n is characterized by a minimization 
of the second factor in (2.30); this is just the quasi-minimal residual property. The 
relation (2.30) shows that the scaling (2.29) is very natural, in the sense that all columns 
of V n+i nzU in the representation (2.30) of r n are treated equally. We remark that the 
QMR iterates x n can be easily updated from step to step. Due to the block tridiagonal 
structure of H n , this update can be implemented with only short recurrences; see [9] 
for details. Finally, we note that the quasi-minimal residual property can be used to 
derive convergence results for the QMR method; we refer the reader to [9, 6]. 

3. A coupled two-term procedure with look-ahead. In this section, we con- 
sider a different approach to constructing the Lanczos vectors. The basic idea is to 
break up the three-term recurrences in the Lanczos process into coupled two-term re- 
currences, by using — in addition to the Lanczos vectors — a suitable second set of basis 
vectors for the underlying Krylov subspaces. In § 9, we will illustrate that QMR based 
on this coupled two-term procedure has better numerical properties than the original 
implementation of QMR based on three-term recurrences. 

3.1. The general setting. In the following, let {u J }" =1 and {tu_,}” =1 always de- 
note the sequence of vectors generated by the look-ahead Lanczos algorithm, as de- 
scribed in § 2.2. We assume that we are also given a second set of basis vectors 


(3.1) 
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for the Krylov subspaces K n (v x ,A) and K n (w 1 ,A T ). More precisely, we consider vec- 
tors (3.1) that — in analogy to (2.24) — are of the form 

(3.2) Pj=*l>j- i(^K and q i = 

where 7 ^ > 0 is given by (2.16), and V’j-i € P j~i is of exact degree j — 1 with the 
same leading coefficient as the polynomial (j>-_ x in (2.24). To distinguish between the 
two bases, we will often refer to the Lanczos vectors and as the V-W 

sequence and to the vectors (3.1) as the P-Q sequence. 

From (2.24) and (3.2), we conclude that, for each n = 1,2, • • • , 

n— 1 

(3.3) P n V n ^ ^ 

»=1 

n— 1 

(3.4) q n = w n - 2 9i u in(7n/7,), 

*=1 

with suitable coefficients u, n € C, i = 1 , 2 , ...,n — 1. Similarly, in view of (2.23), 
(2.24), ( 3 . 2 ), and (2.16), we have 

n 

(3.5) U n+1 = Ap n — ^ vj in , P n+ 1 = ll^n+lllj U n +1 = ^n+l/ Pn+ 1» 

1=1 

n 

(3.6) w n+1 = A T q n - 5Z«'i?in(7n/7<), £n+l = ll»«+llli W n+l = ^n+l/^n+ 1 . 

t=l 

with suitable coefficients l in € C, i — 1,2, ...,n. Note that (3.3)-(3.6) are coupled 
recurrences for generating the P-Q and V-W sequences: first, p n and q n are computed 
by means of (3.3)-(3.4), and then, the next Lanczos pair v n+1 and u > n+1 is obtained 
from (3.5)-(3.6). Of course, it remains to specify the actual choice of the P-Q sequence. 
In order to minimize work and storage of previous vectors, the goal here is to select 
these vectors such that the recurrences (3.3)-(3.6) are as short as possible. 

It will be convenient to use — in addition to (2.8) — the notation 

P n --[Pi Vi Pn] a 11 * 1 Qn : =[9i <h *** 9nl* 

The recurrences (3.3)-(3.6) for the vectors {pj}j =1 , {<Zj}" = u an d can 

then be written compactly in matrix form: 

(3.7) K = P.V„, AP » = 

(3.8) w„ = Q„r;'u n r„, a t q, = w n+ 1 r;|,i n r n . 

Here, U n is an upper triangular matrix and L n is an upper Hessenberg matrix given by 


'1 “12 ••• u ln 


^11 hi hn 


Pi hi : 

01 **. : 

and L n := 

0 p 3 : 

i «n-l,n 


: ; 

nn 

.9 • • • 0 p n+l . 

H 

O 

O 
1 



(3.9) U n := 
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respectively, and T n is the diagonal matrix defined in (2.16). Note that, by eliminating 
P n in (3.7), we obtain 

(3.10) AV n = V n+1 L n U n . 

By comparing (3.10) with (2.14), it follows that 

(3.11) H n = L n U n , 

i.e., the matrices (3.9) define a factorization of the block tridiagonal Hessenberg matrix 
H n generated by the look-ahead Lanczos algorithm. 

Recall from (2.7) and (2.18) that the Lanczos vectors are block biorthogonal. These 
biorthogonality relations determine the coefficients l in in (3.5)-(3.6). For example, 
consider the case that u n+1 and w n+1 are constructed as regular vectors. Then, in view 
of (2.22), we have the condition Wjv n+1 = 0, which, by (3.5), is equivalent to: 

(3.12) 0 = 

1 = 1 

Using (2.7), (2.8), and the first equation in (3.8), we deduce from (3.12) that 

(3.13) 

Recall from (2.18), (2.16), and (3.9) that the matrices D~ l , T n , and U* are block diago- 
nal, diagonal, and lower triangular, respectively. Hence the relation (3.13) implies that 
the vector Q*Ap n determines the length of the recurrences (3.5)-(3.6). In particular, 
in order to obtain recursions that are as short as possible, the P-Q sequence should be 
chosen such that the vector Q„Ap n has as many leading zeros as possible. The same 
conclusion also holds for the case that and are constructed as inner vectors. 

Motivated by this discussion, we require that the vectors in the P-Q sequence be 
A-biorthogonal or block A-biorthogonal, in the sense that the matrix 

(3.14) E n := Q T n AP n 

should be diagonal or block diagonal. Note that the vector Q^Ap n is just the nth 
column of E n . Furthermore, we remark that q^AP ny the nth row of E ny has the same 
zero structure as Q^Ap n . This is a consequence of relation (3.16) in the following 
lemma, which we will also need later on. 

Lemma 3.1. Let {u,}? =1 , {u\}-U and {pj? =1 , {?,}?= I be vectors satisfying (2.24) 
and (3.2), respectively, and let T n = diag( 7 x , 7 2 , • • • , 7 n )- A» an ^ be the matrices 
given by (2.7) and (3.14), respectively. Then : 

D.r. = (Djy, 
e, r. = (E n rj T . 


‘In 


= z? _1 r u t t~ 1 q t Ap 

1 n'-'n A n 


i. 


(3.15) 

(3.16) 
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Proof. Using the polynomial representation (2.24) for the V-W vectors and the 
fact that polynomials in a matrix commute, we obtain 

vfvj'tj = = 7 J -r^.i(A)^ 1 (A)ui7, = wjv^, 


and thus (3.15). The relation (3.16) follows similarly. □ 

3.2. The coupled algorithm without look-ahead. Next, we briefly consider 
the case that the V-W sequence consists of the vectors generated by the classical Lanc- 
zos algorithm without look-ahead. Recall from (2.13) that here the matrix D n in (2.7) 
is diagonal, and that the Lanczos vectors are biorthogonal: 


(3.17) 


T 

W j V n 


f 0 if j ^ n, 

U n 7 ^ 0 if j = n. 


Suppose that it is possible to construct the vectors in the P-Q sequence such that the 
matrix E n in (3.14) is also diagonal: 


(3.18) E n = diag(c 1? e 2 , • • • , ej, where e,- := qjApj ± 0. 


By (3.14) and (3.18), the P-Q vectors are then A-biorthogonal: 


(3.19) 



if j ^ n, 
if j = n. 


With (2.13), (3.9), and (3.19), we deduce from (3.13) that 


(3.20) 


{,« = 0, * = 1,2, •••,»- 1, and l nn = := tJ6 n . 


Furthermore, by multiplying (3.3) from the left by qj A and by using (3.19), (3.6), 
and (3.17), we obtain that, for j = 1, 2, • • • , n — 1, 


(3.21) 


0 = qjAp n = qjAv n - £jUj n 

= {A T qj) T v n - e jUjn 
= (j+iwj+iV n - e jUjn . 


With (3.17), it follows from (3.21) that 

(3.22) u in = 0, i = 1,2, ••-,«- 2, and u n _ 1>n = . 

In view of (3.20) and (3.22), all but the last terms vanish in each of the sums in (3.3)- 
(3.6), and hence (3.3)-(3.6) reduces to a coupled two-term procedure for generating the 
P-Q and V-W sequences. 

The nth iteration of the the resulting algorithm can be summarized as follows. 
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ALGORITHM 3.2. (nth iteration of the coupled algorithm without look-ahead) 

1) If e n _! = 0, then stop. 

Otherwise, compute 8 n = w*v n . 

If 8 n = 0, then stop. 

2) Compute 

Pn ~ Pn-1 (^n^n/ e n— l)» 

9n = W n ~ 

3) Compute e n = q*Ap n> 0 n = t n /8 n , and set 

f’n+l Ap n Pn+l ll^n+lll’ 

u> n +i = A T q n - w n 0 n , £ n+1 = |K + ill- 

4) If p n + 1 = 0 or £ n+1 = 0, then stop. 

Otherwise, set 


v n+l — ^n+l/ Pn+li W n + 1 — “*n+l/£n+l* 

We remark that the vectors in the P-Q and V-W sequences are, up to scaling, just 
the search directions and the residual vectors generated by the BCG method [14, 4]. 
In particular, Algorithm 3.2 can be viewed as the nth iteration of a rescaled version of 
BCG, where the computation of the BCG iterates is omitted. 

In finite arithmetic, one of the termination checks in steps 1) or 4) of the coupled 
two-term procedure will be satisfied after at most N iterations. Normally, the algorithm 
stops due to p n+1 * 0 or £ n+1 = 0, and then the procedure has constructed a basis for 
the invariant subspaces K n (v l ,A) or K n (w 1 , A T ), respectively. This is called regular 
termination. If e n _ x = 0 or 8 n = 0 occurs, then the algorithm has to be stopped to 
avoid division by 0. This is referred to as an exact breakdown. Recall from § 2.2 that 
£ n = 0 signals a breakdown in the classical Lanczos algorithm. Note that, like BCG, the 
coupled two-term procedure now has a second source of breakdown, namely the case 
e n _ 1 = 0. It can be shown that the condition e n _ x = 0 corresponds to a breakdown in 
the BCG algorithm due to an nth iterate not being defined by the Galerkin condition. 

In finite precision arithmetic, exact breakdowns are rather unlikely. However, near 
breakdowns, where 8 n or c n _ x is nonzero, but small in some sense, may occur, leading 
to numerical instabilities in subsequent iterations. Next, we sketch a coupled two-term 
procedure that uses look-ahead in the construction of both the V-W and P-Q sequences 
to avoid exact and near-breakdowns. 

3.3. The general algorithm with look-ahead. For describing the look-ahead 
in the V-W sequence, we will use the notations (2.18)-(2.21) introduced in §2.2. In 
particular, the integer / := l(n) denotes the number of look-ahead steps that have been 
performed during the construction of the first n vectors {t>,}” = i and {tuj" =1 in the 
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V-W sequence, and the n-'s in (2.21) are the indices of the regular vectors. Recall that, 
by (2.7) and (2.18), the V-W vectors satisfy the block biorthogonality conditions 

(3.23) (HrtOfi*) = { ^ jj i.j = 1,2, 

Here, the blocks are all nonsingular, except for possibly However, we have 

that necessarily 

(3.24) is nonsingular, if n /+1 = n + 1. 

Next, we introduce similar notations for describing the look-ahead in the P-Q 
sequence. We denote by k := k(n) the number of look-ahead steps that have been 
performed during the construction of the first n — 1 vectors and {g,-}JsTj in the 

P-Q sequence. In analogy to (2.19), we partition these vectors into blocks, according 
to the look-ahead steps taken: 

(3.25) F n _x = [P (1) P (2) ••• P (k) ] and Q n _ x = [Q (1) Q < 2 > ••• 

Recall from (3.14) that the P-Q vectors are constructed to be block A-biorthogonal. 
More precisely, we have 

E„_ 1 = diag(E< I >,.E<V',£ ( * ) ), 

or, equivalently, 

(3.26) {QMfAPM - {^, f { i,j~ 1,2, •••,*. 

Here, the blocks are nonsingular, except for possibly the last block E^ k \ In analogy 
to (2.21), we denote by m- the indices of the first vectors of the blocks and 
in (3.25). Hence, for j = 1, 2, • • • , fc, we have 

(3.27) P U> = [j> m , p mi+I ■■■ ] and = [? m , ••• ]. 

Furthermore, the indices rrij satisfy 

(3.28) 1 =: m x < m 2 < • • • < m k < n < m k+1 . 

We remark that, by (3.26) and (3.27), the vectors p m . and q m . are A-biorthogonal to 
all previous P-Q vectors, i.e., 

q[Ap m) = q^Api = 0 for aU t = l,2,---,m i - 1. 

Therefore, using the same notation as for the V-W sequence, we refer to the vectors 
p m and q m as regular vectors, while the remaining vectors in (3.27) are called inner. 
Finally, it turns out that, in (3.26), the last block E w has to be nonsingular, if p n and 
q n are constructed as regular vectors. This means that, in analogy to (3.24), 

(3.29) E (k) is nonsingular, if m fc+1 = n. 
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After these preliminaries, we can now sketch the actual algorithm. Let n > 1, and 
assume that we have already generated the first n — 1 vectors {p,}?^ 1 an( i {9i}?=i °f 
the P-Q sequence, and the first n vectors and {to,}"-! of the V-W sequence. 

First, the next pair of P-Q vectors, p n and q n , is constructed, using the recur- 
rences (3.3)-(3.4). Here, the coefficients u in need to be chosen such that p n satisfies 
the corresponding A- biorthogonality conditions (3.26). We note that, in view of (3.16), 
the A- biorthogonality relations for the vector q n are then also fulfilled. With (3.6) 
and (3.23), one readily verifies that 

qj Av n = {A T q i ) T v n = 0 for all i = 1, 2, • • • ,n, - 2. 

By rewriting this in terms of the blocks we obtain that 

(3.30) (Q^) T Av n = 0 for all j - 1,2, • • • , k* - 1, 
where k* := k*(n) is given by 

(3.31) k* = max {j | 1 < j <k and m- < max{l,n, — 1}} . 

The orthogonality conditions (3.30), together with (3.26), imply that, in (3.3), we have 
u in = 0 for all i < m k +. Thus the recurrences (3.3)-(3.4) reduce to 

n— 1 

(3.32) Pn= V n~ E P. U <n. 

n— 1 

(3.33) = *U n ^ 1 9i^tn(7n/Tt)' 

t=m k * 

Now we need to determine the coefficients u in for m fc * < i < n — 1. This is done by 
enforcing the remaining A- biorthogonality conditions (3.26) for p n and the vectors 
namely 

(3.34) qj Ap n = 0 for all * = + 1, • • • ,m*. 

Here, m* := n — 1 if p n and q n are constructed as regular vectors, and m* := m k — 1 
otherwise. Note that, in view of (3.28), we always have m k — 1 < m* . First, we 
consider (3.34) for the indices i in the range m k * < * < m k — 1. Using (3.32) and (3.26), 
we deduce from (3.34) that 

(3.35) U mt = (diag( £<*•) ••• £<*-*» ))" ••• Q^f A .... 

Here and in the sequel, we use the notation 

T 

:j,n * [ ^»n ^jn ] 

for vectors consisting of successive elements of the nth column of the matrix M = [ m- ] T 
and similarly, we denote by M n ii:j row vectors consisting of successive entries of the nth 
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row of M. If p n and q n are constructed as regular vectors, then we also have to ensure 
that (3.34) holds for i with m k < i < n — 1, and this gives 

(3.36) 

We remark that, by (3.29), the matrix E ^ in (3.36) is necessarily nonsingular since 
the vectors p n and q n are regular. If p n and q n are constructed as inner vectors, then 
m* = m k — 1, and (3.34) yields no conditions for the choice of the coefficients u in with 
m k < i < n — 1. In this case, we set 

(3.37) u in = Q n for i = m k ,m k + 1, • • • ,n — 1, 

where ( in 6 C can be chosen arbitrarily. Finally, if p n and q n are regular, we update 
the “regular” indices (3.28) by setting m fc+1 := n and k := k + 1. This completes the 
construction of the p n and q n vectors. 

In a second step, we now compute the next pair of V-W vectors, v n+1 and u> n+1 , 
using the recurrences (3.5)-(3.6) Here, we have to determine the coefficients Z in such 
that u n+1 satisfies the corresponding biorthogonality conditions (3.23). Note that, in 
view of (3.15), the relations (3.23) for u» n+1 axe then fulfilled automatically. We proceed 
similar to the construction of p n and q n . By using (3.26) and the fact that the columns 
of the matrices Q, and W- span the same space, it is easily verified that 

(3.38) {W (i) ) T Ap n = 0 for all j = 1,2,- . • ,T - 1, 
where Z* := Z*(n) is given by 

(3.39) F = max {j 1 | 1 < j < Z and rij < . 

From (3.38), we conclude that the recurrences (3.5)-(3.6) reduce as follows: 

(3-40) v n+1 = Ap n — v ihn, 

(3-41) W n+1 = A T q n - jr 

isn|* 

The recurrence coefficients l in in (3.40)-(3.41) are determined by enforcing the remain- 
ing biorthogonality conditions (3.23) for the vectors u n+1 and to,, < t < n. This 
gives 

(3.42) = (diag( DC*) ••• D<'-'> ))'’ W<‘->] T Ap n . 

Moreover, if u n+1 and tu n+1 axe constructed as regular vectors, then 

(3.43) = (D‘ , >)- 1 (W' ( ' , ) T Ap n . 
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Note that, by (3.24), the matrix D (,) in (3.43) is necessarily nonsingular since u n+1 and 
u; n+1 are regular. If u n+1 and tu n+1 are built as inner vectors, then we set 

(3.44) l in = Vin for * = n h n i + 1> ‘ ‘ 1 . n t 

where rj in € C can be chosen arbitrarily. Finally, if u n+1 and w n+l are regular, then we 
update the indices (2.21) by setting n l+1 := n + 1 and / := / + 1. 

The resulting coupled procedure for generating the P-Q and V-W sequences can 
be sketched as follows. 

ALGORITHM 3.3. (Coupled algorithm with look-ahead) 

0) Choose v l , w-y € with || Vj || = ||t^i|| = 1- 

Set VM = V y, WM = W y, DW = W Jvy. 

Set k = 1, m k = 1, / = 1, 7i ( = 1. 

For n = 1,2, • • • , do: 

1) Determine k* from (3.31). 

2) Decide whether to construct p n and q n as regular or inner vectors 
and go to 3) or 4), respectively. 

3) Compute p n and q n by means of (3.35)-(3.36) and (3.32)-(3.33). 

Set m k+1 = n, k = k + 1, = Q^ k) = 0 and go to 5). 

4) Compute p n and q n by means of (3.35), (3.37), and (3.32)-{3.33). 

5) Set 

p(‘> = [p(‘) pj, g<‘> = [« (fc) «J, £<‘> = 

6) Determine l* from (3.39). 

7) Decide whether to construct u n+1 and w n+1 as regular or inner vectors 
and go to 8) or 9), respectively. 

8) Compute u n+1 and u> n+1 by means of (3.42) -(3.43) and (3.40) -(3.41). 

Set n l+1 = n + 1, / = /+ 1, V (,) = = 0 and go to 10). 

9) Compute u n+1 and tu n+1 by means of (3.42), (3.44), and (3.40)-(3.41). 

10) Compute p n+1 = ||u n+1 || and f n+1 = ||u> n+1 ||. 

If Pn+i = 0 or £ n+1 = 0, then stop. 

Otherwise, set 

u n+l = ^n+l/ Pn+\1 ^n+l = ^n+l/^n+H 

y(0 = [y(0 Vn+1 ], W(') = [WW tu n+1 ] , £>(') = (W(0)*V(0. 

We remark that Algorithm 3.3 reduces to the coupled two-term procedure described 
in §3.2 if all vectors in the P-Q and V-W sequences are built as regular vectors. Note 
that in this case, we have k{n ) = n — 1, = n 1, l{ n ) — n t and fi/* n for all n. 
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4. The look-ahead strategy. As described in §3.2, there are two possible break- 
downs in the coupled two-term procedure without look-ahead: one associated with the 
V-W sequence, and another associated with the P-Q sequence. In particular, Algo- 
rithm 3.2 will encounter an exact breakdown in the V-W sequence if w^v n = 0, or in 
the P-Q sequence if = 0. The exact breakdowns of the two sequences are 

not independent of each other, as was pointed out by Gutknecht in [11]. For a full 
description of the structure and coupling of the exact breakdowns, we refer the reader 
to [11, 2] and the references given there. However, in practice one is also concerned with 
avoiding near-breakdowns, that is, situations when w„v n or are not exactly 

zero, but are small in some sense. 

In the coupled procedure with look-ahead, which we sketched in § 3.3, exact and 
near- breakdowns in the P-Q respectively V-W sequence are prevented by building the 
next pair of vectors p n and q n respectively u n+1 and u? n+1 as inner vectors. In this 
section, we describe the look-ahead strategy that is used to decide in steps 2) and 7) of 
Algorithm 3.3 whether vectors are constructed as regular or inner vectors. 

Recall from § 2.2 that the vectors and in the look-ahead Lanczos 

algorithm satisfy a block three-term recurrence that can be written compactly as (2.14)- 
(2.15). By eliminating V n and W n in (3.7) and (3.8), one obtains a similar recurrence 
relation for the vectors {pj}” =1 and {<Z,}" =1 of the P-Q sequence: 

(4.1) AP„_, = P.t/.I.., and A T Q n _ l =QJ;'U„l„_,T r ._,. 

By using the A-biorthogonality of p n and q n , it is easy to show that the recurrences for 
the P-Q sequence are also three-term recurrences, or, in the general case, block three- 
term recurrences. We note that, in (4.1), the matrix U n L n _ 1 of recurrence coefficients 
is obtained by multiplying the factors L n _ 1 and U n from the decompositions (3.11) of 
H n _ 1 and H n , respectively, in reverse order. This was first remarked by Rutishauser [16] 
for the special case of no look-ahead, and recently by Gutknecht [11], for the general 
case. 

The look-ahead strategy consists of monitoring breakdowns in the two sequences 
independently. For the V-W sequence, the criteria used are the same as those proposed 


in [7]: 


(4.2) 


(4.3) 

»(A)> 

1 

(4.4) 

* ft 


where tps is machine epsilon, and u(A) is an estimate for the norm of A. The Lanczos 
vectors u n+1 and u? n+1 are built as regular vectors only if all three of the above checks 
are satisfied. The check (4.2) ensures that the diagonal blocks are nonsingular, 
while the checks (4.3)-(4.4) ensure that the size of the coefficients p n and u n in (2.23) 
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and in the corresponding relation for tZ> n+1 do not exceed an estimate n(A) for the norm 
of A. The first condition is needed since the inverse of appears in p n , while the 
second and third conditions attempt to ensure that the components from K n ( Tq , A ) 
and from K n {wi, A^) do not dominate the Av n and A^w n terms, respectively. Another 
motivation for these checks is as follows. The symmetric Lanczos process for Hermitian 
matrices A generates tridiagonal matrices H n that satisfy 


(4.5) ||tf nil < \\ A \\ for al1 n - 

For the classical nonsymmetric Lanczos algorithm, the relation (4.5) does not hold in 
general. Indeed, formally, we have ||HJ = oo if a breakdown occurs. As David Day 
pointed out to us 1 , an “ideal” look-ahead Lanczos procedure would ensure that (4.5) 
also holds for non-Hermitian matrices. The criteria (4.3)-(4.4) can be viewed as a 
cheap way of modeling the conditions (4.5). We remark that the checks (4.3)-(4.4) take 
advantage of the normalization (2.9) of the Lanczos vectors. 

For the P-Q sequence, the criteria are similar: the diagonal blocks £ (j) must be 
nonsingular, and the size of the last columns of U n L n _ 1 and of r n 1 I/ n L n _ i r n _ 1 must not 
exceed the estimate n(A) for the norm of A. Singularity of £ (fc) is once again checked 
from its smallest singular value: 


^ e P S ‘ 


However, for the second and third checks, it is no longer sufficient to compute just the 
norm of the last column of the matrices of recurrence coefficients, as the vectors p n and 
q n are not normalized to unit length. Instead, one must check 


(4.6) 




ill’ 


and 


(4.7) 


t M 


lk-ll- 


This means that the look-ahead strategy for the P-Q sequence requires the computation 
of the two norms ||p n || and || 9n || at each step, work that would otherwise not be needed 
by the algorithm. Once again, the vectors p n and q n are built as regular vectors only 
if all three of the above checks are satisfied. We remark that the look-ahead strategy 
presented here builds regular vectors in preference to inner vectors, and will therefore 
build as few inner vectors as possible. 

Finally, we note that other look-ahead strategies are also possible. For example, 
Gutknecht [11] proposed a look-ahead strategy that assumes that the near- breakdowns 
encountered in the two sequences have the same structure as the exact breakdowns. 
We have chosen to monitor the two sequences independently; nevertheless, our strategy 
will recover the exact-breakdown structure if only exact breakdowns are considered. 


1 Private communication, Berkeley, March 1992. 
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5. Implementation details. In this section, we present a detailed description of 
our implementation of the coupled two-term recurrence Lanczos algorithm with look- 
ahead. We are interested in obtaining an implementation that requires only two inner 
products per iteration to compute all the coefficients of the recurrence formulas. Re- 
call that the look-ahead strategy (4.6)-(4.7) for the P-Q sequence and the normaliza- 
tion (2.9) require a total of four norm computations, so that the implementation will 
require two inner products and four norms per iteration. 

Let us denote by 


(5.1) 


G„-i 


the matrix of recurrence coefficients for the three-term recurrences for P n and Q n 
in (4.1). Let us also introduce the auxiliary matrices 

F„-.= WjAP n and F, := Q^AV„, 


whose columns are needed in (3.35)-(3.36) and (3.42)-(3.43). It turns out that these 
two matrices are essentially the transpose of each other. 

LEMMA 5.1. The matrices F n and F n satisfy: 

f , r n = (F„r n ) T . 


Proof. The proof is similar to the proof of Lemma 3.1. □ 


The nth iteration of the implementation will update the matrices D n _ t , E n _ 1 , F n _ 1 , 


^n-H ^n-l> Qn-V ^n> an< ^ ^n> ^ n> ^n> ^n) ^n> 


Qni Kj+ 1» an< ^ ^n+1) 


respectively. 

We first list an outline of the algorithm as we have implemented it. This is es- 
sentially the same a s Algorithm 3.3, up to the order of the steps and more details. 


ALGORITHM 5.2. (Coupled algorithm with look-ahead) 

0) Choose v 1) w 1 € with ||v 1 || = |j || = 1, and compute wfv 1 . 
Set k = 1, m k = 1, / = 1, n, = 1. 

For n = 1,2, •••, do: 

1) Update D n _ 1 to D n . 

2) Determine k* from (3.31): 

k* = max I 1 < j < k and m- < max{l,nj — 1}| . 

3) Compute F n l!B _j from (5.2) below, rising L n _ x and D n l:n . 

Then compute by Lemma 5.1. 

4) Check whether is nonsingular: 

innerp = <w(£ ( *>) < eps. 
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5) Compute the part o/U- i . nn that is determined by biorthogonality: 


//innerp, to 6). Otherwise , set 

U mi „- 

6) Build the part of p n and q n that is common to both regular and 
inner vectors: 


mfc-1 

Pn= V n~ S P« U «n> 

t=m fc * 


mfc — 1 

?n = U, n- H 9, U m(7„/7,)- 

i=m t * 


7 /innerp, 50 to 11). 

7 ) Build and check the coefficient G mk • // innerp, po to 11). 

8) 7?ut/d p n and g n as regular vectors : 

n— 1 

Pn = Pn ^ Pt U tn» 

t=mfc 

n— 1 

9n = 9n ~ H ?i«m(7n/7i)- 
t=m k 

Compute Ap n) q„Ap n , ||p B ||, and ||g n ||. 

9 ) Build and check the coefficient G mk . n _ ln . innerp, <70 to 11). 

10 ) Set m k+1 = n, k = k + 1, and go to 12 ). 

11 ) Choose the inner recurrence coefficients u in , i = m fc ,- • • ,n — 1 , and 
6uz7d p n and g n as inner vectors : 

n-X 

Pn = Pn - £ P. u .n) 

t=tn fc 
n— 1 

9n = 9n ^ 7i U in(7n/ 7t)‘ 

«=m k 

Compute Ap n , q%Ap n) ||p n ||, and ||g B ||. 

12) // ||p n || = 0, or ||g n || = 0, then stop. 

13 ) Compute A T q n . 

14 ) Update E n _ x to E n . 
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15) Determine l* from (3.39): 

1* — max I 1 < j < l and n j < . 

16) Compute F Vnn from (5.3) below, using E n and U n . 

17) Check whether D ^ is nonsingular: 

innerv = < eps. 


18) Compute the part of L 1:nn that is determined by biorthogonality: 

If innerv, go to 19). Otherwise, set 

= (DM)-'(wm) T A P „ = (£»«'>)-> r w . 

19) Build the part o/u n+1 and iw n+1 that is common to both regular and 
inner vectors: 


n ( — 1 

Vfl = A Pn - J2 V i l in> 
t=nj* 

n t -l 

™n+l = ^ T ?n “ £ Wiknilnhi)- 

i =nj* 


If innerv, go to 24). 

20) Build and check the coefficient H n{ . n n . If innerv, go to 24). 

21) Build u n+1 and to n+1 as regular vectors: 

n 

t> n+ l = V n+1 - £ Vmi 

i=nt 

n 

™n+l = ™»+l “ H W i l inhnhi)‘ 
t=nt 

Compute p n+1 = / n+1 , n = ||5 n+1 ||, £ n+ i = ||to n+1 ||. 

#> n+ i = 0 or £ n+1 = 0, tfien stop. 

Otherwise, set 7 n+1 = 7„/> n +i/£n+i> an <* compute w% +l v n+l . 

22) Build and check the coefficient H nrnn+1 . If innerv, go to 24). 

23) Set n <+1 = n + 1, / = / + 1, and go to 25). 

24) Choose the inner recurrence coefficients l in , i = n h • • • ,n, and 
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build u n+1 and tu n+1 as inner vectors : 

n 

v n+ l = V n+1 - 53 V i l tm 

i=n t 

n 

w n+1 = W n+1 - 53 W ihni'1 n hi)- 
i=n f 

Compute p n+1 = / n+1 , n = ||5 n+1 ||, £»+i = IK+ill- 
If p n + i = 0 or £ n+1 = 0, then stop. 

Otherwise, set 7 n+1 = 7 n /Wi/£n+i , an<i compute w% +1 v n+1 . 
25) 5e< 

u n+l = Vn+l/ Pn+1 > ^n+l = ^n+l/^n+U 
W Z+l V n+l = ™n+lVn+l/(Pn+ltn+l)- 


Step 1. The diagonal term has already been computed directly, at the end 

of the previous step. Next, using 


(5.2) 


r„-i = wt,AP„-, = 

I^n— 1 1'l-.n— l,l:n — 1 "b ^n,n — 1 ^l:n— l,n 


0 1], 


the remainder of the last column of D n is computed from D n _ t , F n _ 1 , and L n _ 1 . The 
last row of D n is obtained by symmetry, using (3.15) from Lemma 3.1. 

Step 7. We build which would be the coefficient of the P W and Q ^ 

blocks in the three-term recurrences for p n and q n . Using (5.1), one has: 


G i,n-1 = £ U «An-l> * = m 4l • • • , n - 1. 
i=* 

We then check (4.6)-(4.7), and set innerp to TRUE if at least one of the two checks fails. 

Step 9. We build G m|t: „_i in) which would be the coefficient of the and 
blocks in the three-term recurrences for p n+l and g n+1 . It is straightforward to show 
that 


r T 7 »fc:n— l,n 


(£<*))-i(<3 (fc) ) r ,4Ap n . 


Moreover, we have 

Ql-^Ap„ = [a t q„_^ t Ap„ = (<3„r; 1 c/„£„_ I r n _ 1 ) r Ap, 

= r„.,£L I ^r; 1 ojAp„ 

= r„. 1 £L 1 £/n T r; , [o ••• « il A v,\ T 
= — ••• 0 qZA Pn ) T . 

in 
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We then check (4.6)-(4.7), and set innerp to TRUE if at least one of the two checks fails. 

Step 14. The diagonal term q„Ap n has already been computed directly, as part of 
Step 8) or Step 11). Next, using 

(5.3) F, = wj AP n = r n [/ n T r;'o^f n , 

the remainder of the last row of E n is computed from E n _ 1? E 1 . n<l . n _ 1 , and U n . The last 
column of E n is obtained by symmetry, using (3.16) from Lemma 3.1. 

Step 20. We build H nt:nn , which would be the coefficient of the and W® 
blocks in the three-term recurrences for u n+1 and u> n+1 . Using (3.11), one has: 


= * = n„ •••,«. 

}-• 

We then check (4.3)-(4.4), and set innerv to TRUE if at least one of the two checks fails. 

Step 22. We build /f n| . n n+1 , which would be the coefficient of the and 
blocks in the three-term recurrences for v n+2 and to n+2 . It is straightforward to show 
that 


H. 


n\ :n,n+l 




Moreover, we have 

w^av^= (a t w„) t Av„„ = (w , n+I r;| 1 i„t/„r„) T v „ +1 

V n+1 

= TJJlLlKU [0 •••0 toJ fl v„ +1 ] T 

= 7^-L+l,»[0 ••• 0 W »+l V n+l 1 T - 

7n+l 

We then check (4.3)-(4.4), and set innerv to TRUE if at least one of the two checks fails. 

We remark that the checks in steps 7) and 9), and 20) and 22), are actually slightly 
relaxed versions of (4.3)-(4.4), and (4.6)-(4.7), respectively, since the indices checked 
are only a subset of the full range appearing in (4.3)-(4.4) and (4.6)-(4.7). We also 
note that the algorithm above requires minimal inputs from the user. Recall that eps 
in steps 4) and 17) is machine epsilon. Furthermore, the estimate n{A) for the norm of 
the matrix can be updated dynamically, as was done in [7]. 

The coupled Lanczos Algorithm 5.2 requires per iteration the computation of two 
inner products and four vector norms. We conclude this section by noting that, in Al- 
gorithm 5.2, the choice of the inner recurrence coefficients (3.37) and (3.44) is arbitrary. 
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In our implementation of the algorithm, we have used 


'n— l,n 
'n— 2,n 1 ’ 

when 

m k < n — 2, 

“in = °> 

for i 

= m fc , •••,«- 3, 

L n=l, 

U.»=i, 

when 

n, < n — 1, 

hn = 0, 

for i 

II 

3 

1 


for the inner vector recurrence coefficients. 


6. An implementation of QMR with look-ahead. We now return to linear 
systems (1.1) and the QMR method. In this section, we propose an implementation of 
QMR method based on the coupled two-term look-ahead Algorithm 3.3. 

Recall that the nth QMR iterate x n is defined by (2.25)-(2.26) in terms of the 
matrices V n and H n generated by the look-ahead Lanczos algorithm. In the original 
implementation of QMR, the solution z n of the least squares problem (2.26) is computed 
by means of a QR decomposition of the matrix Cl n+l H n . 

Here we consider the case that the Lanczos vectors are constructed using the coupled 
two-term Algorithm 3.3. Recall that Algorithm 3.3 yields as a by-product the factors 
L n and U n in the decomposition (3.11) of H n . Using the factorization (3.11) and setting 
y n := U n z n , we can rewrite the definition (2.25)-(2.26) of z n as follows: 

( 6 . 1 ) I b = *0 Vm 


where y n is the unique solution of the least squares problem 

(6.2) ||/ n +i - n n+ iL n y n \\ = min ||/ n+1 - ft n+1 L„y||. 

y € 

Here, as before, / n+1 is given by (2.27) and fl n+1 is defined in (2.28). We remark that 
the least squares problem (6.2) is actually cheaper to solve than the original one (2.26). 
The reason is that the matrix L n in (6.2) has fewer nonzero elements than H n in (2.26). 
For example, if no look-ahead steps are taken, then H n is tridiagonal, while L n is a 
lower bidiagonal matrix. This special case will be considered in more detail in § 7. 

As discussed in [9], solutions of least squares problems of the type (6.2) can be 
easily updated from step to step, using the QR decomposition of fl n+1 L n , 

(6.3) ^n+l-^n = Qn 



where Q n is a unitary (n + 1) x (n + 1) matrix, and R n is a nonsingular upper triangular 
n x n matrix. With this, the least squares problem (6.2) becomes 


min ||/ n+1 - n n+1 L n y|| 

y € c 




/n+1 


Rn 

0 




R 
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This gives for y n : 


(6.4) 


Vn = (R n) wtere 



• Qnfn+1' 


Finally, we note that it is possible to update the QMR iterate at each step, as 
was done in the original QMR algorithm. Full implementation details were given in [9, 
Sect. 4], and we will not repeat them here. The point is that the QMR iterates have 
an update formula of the form 


(6.5) x n = x n . x + d n r n . 

Here, r n is given by (6.4), and d n is an auxiliary search direction defined as the last col- 
umn of the matrix V n U~ l R~ l . The vectors d n are also updated with short recurrences: 
the recurrence for d n involves only as many vectors as the recurrence for v n+1 . For full 
details of the update procedure for d n , we refer the reader to [9]. 

The basic outline of the resulting implementation of QMR based on the coupled 
two-term Algorithm 3.3 is then as follows. 

ALGORITHM 6.1. (QMR based on coupled recurrences) 

0) Choose x 0 € and set r 0 = b — Ax 0 , p 1 = ||r 0 ||, v x = r 0 /p 1 . 

Choose w x € C N with ||u>j|| = 1. 

For n = 1,2,---, do: 

1) Perform the nth iteration of the coupled two-term Algorithm 3.3. 

This yields matrices L n , P n , U n) and V n+l> which satisfy (3.7). 

2) Update the QR factorization (6.3) of £l n+1 L n and the vector t n in (6.4). 

3) Update the QMR iterate x n by means of (6.5). 

4) If x n has converged, then stop. 

In [9], various properties of the QMR method are given. For example, it is shown how 
existing BCG iterates can be easily recovered from the QMR process, and how estimates 
for the QMR residual norms can be obtained at no extra costs. We would like to stress 
that these properties also hold true for the particular implementation of QMR sketched 
in Algorithm 6.1. 

7. An implementation of QMR without look-ahead. In this section, we 
present the simplification of Algorithm 6.1 to the case where no look-ahead is used. 
We also briefly address the issue of preconditioning. 

Let M be a given nonsingular N x N matrix that approximates in some sense the 
coefficient matrix A of (1.1). Suppose further that A/ is decomposed as 

(7.1) M = M 1 M 2 . 

Then, one applies the QMR algorithm to the system 

(7.2) A!y' = b\ 
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where A! — AffMMf 1 , b' = Mj *6, and x' — M 2 x. It is easy to see that the linear 
systems (1.1) and (7.2) are equivalent, and that one can transform back from the iterates 
x' n and the residuals r' n of the system (7.1) to the iterates x n and the residuals r n of the 
system (1.1). For example, while applying QMR to the preconditioned system (7.2), it 
is possible to write the resulting algorithm in terms of the quantities corresponding to 
the original system (1.1); this is what is done below. 

We now present a version of the QMR algorithm based on the coupled Algo- 
rithm 3.2, which does not have look-ahead. We remark that in this case, by (3.22) 
and (3.20), the matrix U n in (3.9) is upper bidiagonal, and L n is a lower bidiagonal 
matrix. We also implement preconditioning, as discussed above. The resulting QMR 
algorithm is as follows. 

ALGORITHM 7.1. (QMR based on coupled recurrences without look-ahead) 

0) Choose x 0 £ C N and set r 0 = b — Ax 0 . 

Compute p 1 = ||Mf l r 0 || and set v l = r 0 /p a . 

Choose w j € C N with ||Mf T u; 1 || = 1. 

Set p 0 — <Jq = d( j, Cq — £q 1, $o 0, t)q 1. 

For n = 1, 2, • ■ • , do: 

1) If t n _ i = 0, then stop. 

Compute 8 n = wf i M~ l v n . 

If 8 n = 0, then stop. 

2) Compute 

p n = M~ l V n - P„_i(£A/ C n-l)> 

<?n= M~ T W n - qn-iipjjtn-i). 

3) Compute t n = qlAp n) 0 n = e n /8 n , and set 

= A Pn ~ *\A> Pn + 1 = 
ri n+1 = A T q n - w n 0 n , t n + l = \\M 2 T w n+l \\. 

4) Compute 


5) If p n+1 = 0 or £ n+1 = 0, then stop. 

Otherwise, set 

u n+l = 4+1 / Pn+11 ^n+l ^n+l/^n+l • 


Vn ~ ~Vn-l 


— ^n+lPn+l _ * 

~ Un C n-\\PS " yfl + ’ 

= P„J?„ + ^„_i(^„_ic„) 2 , x n = X n _! + d n . 


PnS 
&4-1 ’ 
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As pointed out in §3.2, the vectors computed in steps 2) and 3) of Algorithm 7.1 
are rescaled versions of vectors used in BCG. Freund and Szeto [10] have used this 
connection to derive an implementation of QMR without look-ahead that is directly 
based on BCG. Their Algorithm 3.1 in [10] and Algorithm 7.1 are mathematically 
equivalent. 

8. QMR for complex symmetric matrices. In this section, we briefly discuss 
the application of the QMR Algorithm 7.1 to the solution of complex symmetric linear 
systems, i.e., systems with 


A = A r € C N * N . 

We note that the QMR approach was originally proposed by Freund in [5] for exactly 
this class of linear systems. 

The benefit of applying a Lanczos method to complex symmetric systems is that 
the underlying Lanczos algorithm simplifies naturally. Recall that in the coupled Algo- 
rithm 3.3, the second starting vector w 1 is arbitrary. In the case of a symmetric matrix, 
if w j is chosen equal to t>j, then it is easy to show that w n = v n and q n = p n , for all n. 
Thus, the recurrences for iu n and q n can be eliminated, saving roughly half the amount 
of work and storage. 

However, we remark that — in contrast to the Lanczos algorithm for Hermitian 
matrices, where breakdowns are excluded — the Lanczos process for complex symmetric 
also requires look-ahead in order to avoid exact and near-breakdowns. This is discussed 
in detail in [5]. 

The only other issue in the case of complex symmetric systems is that the precon- 
ditioner M in (7.1) must also be symmetric, i.e., 

(8.1) M = M X M 2 = (M 1 M 2 ) t = M t . 

For example, this is always guaranteed if the decomposition (7.1) is “symmetric” in the 
sense that 

(8.2) Af 2 = M x r . 

However, we stress that the condition (8.2) is not necessary, and M x and M 2 can be ar- 
bitrary matrices satisfying (8.1). We remark that standard preconditioning techniques, 
such as incomplete factorization, yield symmetric preconditioners (8.1) when applied to 
symmetric matrices A. 

In this case, one can apply the QMR Algorithm 7.1 to the resulting preconditioned 
system. Once again writing everything in terms of the quantities corresponding to the 
original system (1.1), one obtains the following iteration. 
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ALGORITHM 8.1. (QMR without look- ahead for complex symmetric systems) 

0) Choose x 0 € C N and set r 0 = b — Ax 0 . 

Compute p l = ||Af x r 0 || and set v 1 = r 0 /p 1 . 

Set p 0 = c? 0 , Cq = e 0 = 1 , t?o = 7o = — 

For n = 1 , 2, • • • , do: 

1) If e n _ 1 = 0, then stop. 

Compute <$ n = vfM~ l v n . 

If S n = 0, then stop. 

2) Compute 

p n = M~ 1 v n -p n _ l {p n SJe n _ 1 ). 


3) Compute e n = plAp n , 0 n = eJ6 n , and 

®n+i = A Pn ~ vj n , p n+1 = \\Mf l v n+1 \ 


4) Compute 


,9 _ jWnjl „ _ 1 „ ^ 

u n — 1/31’ C n } ’ 'In 'In— l n o ’ 

^n C n-\ \&n\ yj\ + l?£ Pn°n-\ 

d n = PnPn + = X n-1 + d n‘ 


5) If p n+1 = 0, then stop. 
Otherwise, set 


V n+1 — ^n+l/ Pn+ 1* 

9. Numerical experiments. In this section, we present a few numerical exam- 
ples. We compare the original and the new implementation of the QMR algorithm, 
as well as illustrate an application of the coupled QMR algorithm to the solution of 
complex symmetric linear systems. 

In Figures 9.1-9. 3 below, we always show the true relative residual norm ||r n ||/||r 0 || 
plotted versus the iteration index n. 

Example 9.1. This example is taken from [1], and is meant to illustrate the typical 
behavior that can be expected from the new implementation of QMR when compared 
to the original implementation. We consider the differential equation 

Lu = f on (0,1) x (0,1), 



+20(x + y)~^~ + 20^ ((x + y)u) + 


1 + x + y U ' 


(9.1) 

where 


Lu := 
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Fig. 9.1. Convergence curves for Example 9.1. 

with Dirichlet boundary conditions u = 0. We discretized (9.1) using centered differ- 
ences on a 29 x 29 grid with mesh size h = 1/30. This leads to a linear system Ax = b, 
where A is a nonsymmetric matrix of order N = 900 with 4380 nonzero entries. We 
ran the original and the new implementations of QMR, both with look-ahead and with 
the same starting conditions, until the true residual norm ||r n || was not reduced any 
further. The vectors b and u> 1 were random vectors, the initial guess x 0 was zero, and 
the example was run without preconditioning. The original QMR algorithm is plotted 
with a dotted line; it stagnated at 6.7E— 14, and it built 6 look-ahead blocks of size 2. 
The coupled QMR algorithm is plotted with a solid line; it stagnated at 8.3E— 15, and it 
built 4 blocks of size 2 in the V-W sequence and 7 blocks of size 2 in the P-Q sequence. 
This behavior seems to be fairly typical, in that usually the new implementation is 
better than the original implementation, but the difference is not very large. 

However, there are cases where the coupled implementation is significantly better 
than the original QMR implementation. The next example is of this type. 

Example 9.2. This is a linear system that arises in performance modeling of mul- 
tiprocessor systems, using Petri net analysis. In such applications, one obtains large 
sparse singular matrices A with null spaces of dimension 1, and one needs to compute 
a nontrivial basis vector for this null space. This leads to a linear system of the form 

Ax = 0, 

and thus condition (2.1) is satisfied. We used a matrix A of size N = 3663 with 23397 
non-zero elements. The vector b was zero, while the initial guess x 0 and the starting 
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Fig. 9.2. Convergence curves for Example 9.2. 


vector w l were both random. The linear system is a difficult one and does not converge 
easily without preconditioning. We used the variant described in [9] of Saad’s ILUT 
preconditioner [17], with no additional fill-in allowed and a drop tolerance of 0.001, 
which generated a preconditioning matrix M with 23397 elements. The original QMR 
algorithm is plotted with a dotted line; it stagnated at 2.9E— 5, and it built 2 blocks of 
size 2. On the other hand, the new implementation, plotted with a solid line, stagnated 
at 1.2E— 12, and it built 1 block of size 2 in the V-W sequence and 3 blocks of size 2 
in the P-Q sequence. 

Example 9.3. Here A is the complex symmetric YOUNG1C matrix from the 
Harwell-Boeing test collection of sparse matrices [3]. The matrix arises in a scattering 
problem in aerodynamics research; it is of dimension N = 841 with 4089 non-zero ele- 
ments. We ran Algorithm 8.1, with various complex symmetric ILUT preconditioners. 
In all cases, the iteration was started with the same random vector for b and zero ini- 
tial guess x Q . This system is also a difficult one, and if not preconditioned, the QMR 
algorithm requires around 700 iterations to reach the stagnation level of 2.5E— 14; the 
corresponding convergence curve is plotted in a dashed line. However, the ILUT pre- 
conditioner is quite effective in this example, especially at higher levels of allowed fill-in 
and/or drop tolerance. The graph shows, in order of solid lines from right to left, ILUT 
with no additional fill-in and 0.001 drop tolerance (2375 non-zero elements), ILUT with 
5 additional fill-in and 0.001 drop tolerance (5171 non-zero elements), ILUT with 10 
additional fill-in and 0.001 drop tolerance (9320 non-zero elements), and finally ILUT 
with 16 additional fill-in and 0.0 drop tolerance (13329 non-zero elements). As can be 
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Fig. 9.3. Convergence curves for Example 9. 3. 


seen, all variants reach roughly the same stagnation level, around 1.0E— 14. However, 
they do so in fewer and fewer iterations, and in fact, for this example, the additional 
time spent computing the denser preconditioners was always made up by converging in 
fewer iterations. 


10. Concluding remarks. We have presented a new look-ahead algorithm for 
constructing Lanczos vectors based on coupled two-term recurrences instead of the 
usual three-term recurrences. We then discussed a new implementation of the quasi- 
minimal residual algorithm, using the coupled process to build the basis for the Krylov 
space. While the theoretical results derived for the original algorithm carry over to the 
new one, the latter was shown in examples to have better numerical properties. We 
also briefly covered an implementation of the new QMR method without look-ahead, 
as well as the application of the QMR algorithm to the solution of complex symmetric 
linear systems, where the underlying Lanczos process naturally simplifies. 

FORTRAN 77 codes for the proposed coupled-two term look-ahead procedure and 
the resulting new implementation of the QMR algorithm can be obtained electroni- 
cally from the authors (freund@research.att.com or namachtigal@na-net.ornl.gov). We 
note that FORTRAN 77 codes for the original implementation of QMR and the un- 
derlying look-ahead Lanczos algorithm are available from netlib by sending an email 
message consisting of the single line “send lalqmr from misc” to netlib@oml.gov or 
netlib@research.att.com. 
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